<!DOCTYPE HTML>
<!--
	Dimension by HTML5 UP
	html5up.net | @ajlkn
	Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
 <head>
  <title>
   Dimension by HTML5 UP
  </title>
  <meta charset="utf-8"/>
  <meta content="width=device-width,initial-scale=1.0" name="viewport"/>
  <link href="../../assets/css/article.css" rel="stylesheet"/>
  <link href="https://cdn.bootcss.com/highlight.js/9.15.8/styles/github.min.css" rel="stylesheet"/>
  <noscript>
   <link href="../../assets/css/noscript.css" rel="stylesheet"/>
  </noscript>
 </head>
 <body>
  <div id="app">
  </div>
  <!-- built files will be auto injected -->
 </body>
 <body class="is-preload">
  <!-- Wrapper -->
  <div id="wrapper">
   <!-- Main -->
   <div id="main">
    <article id="article">
     <h1 id="id3c45">
      Decision Trees: ID3 and C4.5
     </h1>
     <hr/>
     <p>
      <em>
       Based on chapter 17 of the Geek Time (极客时间) course &lt;数据分析实战45讲&gt;
      </em>
     </p>
     <h2 id="_1">
      Building a Decision Tree
     </h2>
     <p>
      Building a decision tree involves two stages: construction and pruning.
     </p>
     <h3 id="_2">
      Construction
     </h3>
     <p>
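      Construction means growing a complete decision tree. Put simply, it is the process of choosing which attribute to use at each node. Three kinds of nodes appear during construction: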
     </p>
     <ul>
      <li>
       Root node: the first node of the tree
      </li>
      <li>
       Internal node: a node in the middle of the tree
      </li>
      <li>
       Leaf node: a node at the bottom of the tree
      </li>
     </ul>
     <p>
      Construction has to answer three key questions:
     </p>
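     <ul>
      <li>
       1. Which attribute should be the root node?
      </li>
      <li>
       2. Which attributes should be the child nodes?
      </li>
      <li>
       3. When should splitting stop and yield a final state, i.e. a leaf node?
      </li>
     </ul>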
     <h3 id="_3">
      Pruning
     </h3>
     <p>
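      Pruning slims the tree down so that it needs fewer tests while still giving good results, which helps prevent overfitting. Pruning is either pre-pruning or post-pruning.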
     </p>
     <p>
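      Pre-pruning prunes while the tree is being built. Each node is evaluated during construction: if splitting a node does not improve accuracy on the validation set, the split is pointless, so the node is kept as a leaf and is not split.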
     </p>
     <p>
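      Post-pruning prunes after the tree has been fully grown, usually working upward level by level from the leaves and evaluating each node. If removing a node's subtree leaves classification accuracy about the same or improves it, that subtree can be pruned. The subtree is replaced by a single leaf node, labelled with the most frequent class in that subtree.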
     </p>
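     <p>
      The pre-pruning check described above can be sketched roughly as follows. This is only a minimal illustration, not part of the implementation later in this article: the function names (<code>majority_label</code>, <code>accuracy</code>, <code>should_split</code>) and the data layout (a list of <code>(features, label)</code> pairs) are assumptions made for the example.
     </p>

```python
from collections import Counter


def majority_label(labels):
    """Most frequent class label in a list."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if not labels:
        return 0.0
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def should_split(train_rows, val_rows, attr_index):
    """Pre-pruning test: split only if validation accuracy improves.

    Rows are (features, label) pairs; this layout is an assumption of the sketch.
    """
    val_labels = [y for _, y in val_rows]

    # Option A: keep the node as a leaf that predicts the training majority class.
    leaf_label = majority_label([y for _, y in train_rows])
    acc_leaf = accuracy([leaf_label] * len(val_rows), val_labels)

    # Option B: split once on attr_index; each branch predicts its own majority class.
    branches = {}
    for x, y in train_rows:
        branches.setdefault(x[attr_index], []).append(y)
    branch_label = {value: majority_label(ys) for value, ys in branches.items()}
    predictions = [branch_label.get(x[attr_index], leaf_label) for x, _ in val_rows]
    acc_split = accuracy(predictions, val_labels)

    return acc_split > acc_leaf


train = [(("high",), "no"), (("high",), "no"), (("low",), "yes")]
val = [(("low",), "yes"), (("high",), "no")]
print(should_split(train, val, 0))  # True
```

     <p>
      Here the split raises validation accuracy from 0.5 to 1.0, so the node is split; when the split brings no improvement the function returns False and the node stays a leaf.
     </p>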
     <h3 id="_4">
      Node Selection
     </h3>
     <p>
      Consider the following basketball dataset:
     </p>
     <table>
      <thead>
       <tr>
        <th>
         Weather
        </th>
        <th align="center">
         Temperature
        </th>
        <th align="center">
         Humidity
        </th>
        <th align="center">
         Windy
        </th>
        <th>
         Play basketball
        </th>
       </tr>
      </thead>
      <tbody>
       <tr>
        <td>
         Sunny
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         Medium
        </td>
        <td align="center">
         No
        </td>
        <td>
         No
        </td>
       </tr>
       <tr>
        <td>
         Sunny
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         Medium
        </td>
        <td align="center">
         Yes
        </td>
        <td>
         No
        </td>
       </tr>
       <tr>
        <td>
         Overcast
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         No
        </td>
        <td>
         Yes
        </td>
       </tr>
       <tr>
        <td>
         Light rain
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         No
        </td>
        <td>
         Yes
        </td>
       </tr>
       <tr>
        <td>
         Light rain
        </td>
        <td align="center">
         Low
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         No
        </td>
        <td>
         No
        </td>
       </tr>
       <tr>
        <td>
         Sunny
        </td>
        <td align="center">
         Medium
        </td>
        <td align="center">
         Medium
        </td>
        <td align="center">
         Yes
        </td>
        <td>
         Yes
        </td>
       </tr>
       <tr>
        <td>
         Overcast
        </td>
        <td align="center">
         Medium
        </td>
        <td align="center">
         High
        </td>
        <td align="center">
         Yes
        </td>
        <td>
         No
        </td>
       </tr>
      </tbody>
     </table>
     <p>
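      The key question here is which attribute (weather, temperature, humidity, or wind) should become the root node. The choice is based on purity and information entropy, which are inversely related: the lower the purity, the higher the entropy; the higher the purity, the lower the entropy.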
     </p>
     <p>
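      The detailed definitions and formulas for purity and information entropy are covered in lecture 17 of the Geek Time course 数据分析实战45讲, so they are not repeated in full here. The formulas are tedious and easy to explain badly, and the course covers them reasonably well if you need them.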
     </p>
     <p>
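      Decision trees are built on purity, and there are three classic "impurity" measures: information gain (the ID3 algorithm), information gain ratio (the C4.5 algorithm), and the Gini index (the CART algorithm). This article covers ID3 and C4.5.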
     </p>
     <p>
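      ID3 works with information gain: the increase in purity, i.e. the drop in information entropy, that a split produces. It is computed as the parent node's entropy minus the normalized entropy of its child nodes, where each child's entropy is weighted by the probability of that child within the parent.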
     </p>
     <p>
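      Construction therefore needs two quantities: information entropy and information gain. The basketball dataset is used below to show how each is computed: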
     </p>
     <h4 id="_5">
      Information Entropy
     </h4>
     <h5 id="_6">
      Root-Node Entropy
     </h5>
     <p>
      Information entropy is computed as follows:
     </p>
     <p>
      $$Ent(D) = - \sum p_k \log_2 p_k$$
     </p>
     <p>
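      The sum runs over the possible outcomes, of which there are 2 here (playing and not playing basketball). $p_k$ is the probability of outcome $k$. The dataset above has 7 records, 3 playing and 4 not playing, so $p_k$ is $\frac{4}{7}$ for not playing and $\frac{3}{7}$ for playing. Before any attribute has been chosen for the root, the entropy of the root node is: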
     </p>
     <p>
      $$Ent(D) = - \sum p_k \log_2 p_k = - (\frac{4}{7} \log_2 \frac{4}{7} + \frac{3}{7} \log_2 \frac{3}{7}) = 0.985$$
     </p>
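     <p>
      As a quick check of the number above, the root entropy can be reproduced in a few lines of Python. This is a standalone sketch rather than the implementation given later; the helper name <code>entropy</code> and the label strings are chosen for illustration only.
     </p>

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Class labels of the seven records: 3 "yes" (play) and 4 "no" (do not play).
labels = ["no", "no", "yes", "yes", "no", "yes", "no"]
root_entropy = entropy(labels)
print(round(root_entropy, 3))  # 0.985
```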
     <h5 id="_7">
      Normalized Information Entropy
     </h5>
     <p>
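      Splitting on weather gives three child nodes: D1 (sunny), D2 (overcast), and D3 (light rain). Writing + for playing and - for not playing, with the record number in front of each sign, D1, D2, and D3 are: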
     </p>
     <ul>
      <li>
       D1 (weather = sunny) = {1-, 2-, 6+}
      </li>
      <li>
       D2 (weather = overcast) = {3+, 7-}
      </li>
      <li>
       D3 (weather = light rain) = {4+, 5-}
      </li>
     </ul>
     <p>
      The entropies of the three child nodes are:
     </p>
     <p>
      $$Ent(D1) = - (\frac{1}{3} \log_2 \frac{1}{3} + \frac{2}{3} \log_2 \frac{2}{3}) = 0.918$$
$$Ent(D2) = - (\frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2 \frac{1}{2}) = 1.0$$
$$Ent(D3) = - (\frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2 \frac{1}{2}) = 1.0$$
     </p>
     <p>
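      D is the full dataset of 7 records; splitting it on weather puts 3 records in D1, 2 in D2, and 2 in D3. The probability of D1 within D is 3/7, of D2 is 2/7, and of D3 is 2/7. The normalized entropy of the child nodes is therefore: $\frac{3}{7} \times 0.918 + \frac{2}{7} \times 1.0 + \frac{2}{7} \times 1.0 = 0.965$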
     </p>
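     <p>
      The weighted sum above can be checked with a short sketch. The helper names (<code>entropy</code>, <code>weighted_entropy</code>) and the English value strings are assumptions for this example; record 5 is taken as "not playing", matching D3 = {4+, 5-}.
     </p>

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def weighted_entropy(values, labels):
    """Entropy of each child node, weighted by the child's share of the records."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


# Weather and outcome for records 1 to 7.
weather = ["sunny", "sunny", "overcast", "rain", "rain", "sunny", "overcast"]
play = ["no", "no", "yes", "yes", "no", "yes", "no"]
weather_entropy = weighted_entropy(weather, play)
print(round(weather_entropy, 3))  # 0.965
```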
     <h4 id="_8">
      Information Gain
     </h4>
     <p>
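      From the entropies above, the information gain of using weather as the splitting attribute is the root entropy minus the normalized entropy: $Gain(D, \text{weather}) = 0.985 - 0.965 = 0.020$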
     </p>
     <h4 id="_9">
      Choosing Nodes by Entropy and Information Gain
     </h4>
     <p>
      Repeating the calculation above for every attribute gives the following information gains:
     </p>
     <ul>
      <li>
       Gain(D, weather) = 0.020
      </li>
      <li>
       Gain(D, temperature) = 0.128
      </li>
      <li>
       Gain(D, humidity) = 0.020
      </li>
      <li>
       Gain(D, wind) = 0.020
      </li>
     </ul>
     <p>
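      Temperature has the largest information gain, so temperature becomes the root node of the whole tree. Its three values (high, medium, low) then produce three child nodes, each chosen with the same algorithm as the root; a node whose records all share one outcome becomes a leaf. Repeating this yields the decision tree.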
     </p>
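     <p>
      The four gains and the choice of root can be reproduced with the sketch below. It is independent of the full implementation later in this article; <code>ROWS</code>, <code>ATTRIBUTES</code>, and <code>information_gain</code> are names made up for the example, and record 5 is again taken as "not playing".
     </p>

```python
import math
from collections import Counter, defaultdict

# The seven records: (weather, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "high", "medium", "no", "no"),
    ("sunny", "high", "medium", "yes", "no"),
    ("overcast", "high", "high", "no", "yes"),
    ("rain", "high", "high", "no", "yes"),
    ("rain", "low", "high", "no", "no"),
    ("sunny", "medium", "medium", "yes", "yes"),
    ("overcast", "medium", "high", "yes", "no"),
]
ATTRIBUTES = ["weather", "temperature", "humidity", "windy"]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, index):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    labels = [row[-1] for row in rows]
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row[-1])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - children


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(best)  # temperature
```

     <p>
      Applying the same function to the subset of rows under each temperature value would pick the next level of nodes.
     </p>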
     <h4 id="_10">
      Information Gain Ratio
     </h4>
     <p>
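      The source course says relatively little about this part and I found it confusing on first read, so I used this well-written article as a reference:
      <a href="https://blog.csdn.net/zjsghww/article/details/51638126">
       C4.5算法详解 (C4.5 Algorithm Explained)
      </a>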
     </p>
     <p>
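      The information gain ratio is what C4.5 uses to improve on ID3. It is derived from the information gain and the attribute entropy (the split information measure): the information gain divided by the attribute entropy.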
     </p>
     <h5 id="_11">
      Attribute Entropy (Split Information)
     </h5>
     <p>
      An example is safer here than a formula I might misstate. For the table above, the attribute entropies are:
     </p>
     <p>
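      $$H(\text{weather}) = -(\frac{3}{7} \log_2 \frac{3}{7} + \frac{2}{7} \log_2 \frac{2}{7} + \frac{2}{7} \log_2 \frac{2}{7}) \quad \text{(sunny, overcast, light rain: 3/7, 2/7, 2/7)}$$
$$H(\text{temperature}) = -(\frac{4}{7} \log_2 \frac{4}{7} + \frac{2}{7} \log_2 \frac{2}{7} + \frac{1}{7} \log_2 \frac{1}{7}) \quad \text{(high, medium, low)}$$
$$H(\text{humidity}) = -(\frac{4}{7} \log_2 \frac{4}{7} + \frac{3}{7} \log_2 \frac{3}{7}) \quad \text{(two values)}$$
$$H(\text{wind}) = -(\frac{4}{7} \log_2 \frac{4}{7} + \frac{3}{7} \log_2 \frac{3}{7}) \quad \text{(two values)}$$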
     </p>
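     <p>
      To make the gain ratio concrete, the sketch below evaluates the attribute entropies and divides the information gains reported earlier by them. The names <code>split_information</code>, <code>value_counts</code>, and <code>gain_ratio</code> are illustrative assumptions, and the gains are the rounded figures from this article rather than recomputed values.
     </p>

```python
import math


def split_information(counts):
    """Entropy of the attribute's own value distribution (class labels are ignored)."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts)


# Number of records per attribute value, read off the table.
value_counts = {
    "weather": [3, 2, 2],      # sunny, overcast, light rain
    "temperature": [4, 2, 1],  # high, medium, low
    "humidity": [4, 3],        # high, medium
    "windy": [4, 3],           # no, yes
}
# Information gains as reported earlier in this article (rounded).
gains = {"weather": 0.020, "temperature": 0.128, "humidity": 0.020, "windy": 0.020}

gain_ratio = {
    name: gains[name] / split_information(counts)
    for name, counts in value_counts.items()
}
best = max(gain_ratio, key=gain_ratio.get)
print(best)  # temperature
```

     <p>
      On this dataset temperature still wins under the gain ratio; dividing by the attribute entropy mainly penalizes attributes with many distinct values, such as weather here.
     </p>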
     <h2 id="id3c45python3">
      ID3 and C4.5 Implementation (Python 3)
     </h2>
     <div class="codehilite">
      <pre><span></span><code><span class="c1"># -*- coding: utf-8 -*-</span>
<span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">copy</span>

<span class="kn">import</span> <span class="nn">math</span>


<span class="k">def</span> <span class="nf">calculateInfoEntropy</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">attributeIndex</span><span class="p">):</span>
    <span class="sd">"""</span>

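<span class="sd">    :param trainData: dict mapping each sample (an indexable key such as a tuple of attribute values) to its class label</span>
<span class="sd">    :param attributeIndex: position of the attribute inside each sample</span>
<span class="sd">    :return: for each value of the attribute, the sample count and the entropy of its class labels</span>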
<span class="sd">    {</span>
<span class="sd">        status: {</span>
<span class="sd">            count: value,</span>
<span class="sd">            value: value,</span>
<span class="sd">        },</span>
<span class="sd">        ......</span>
<span class="sd">    }</span>
<span class="sd">    """</span>
    <span class="n">statusStatistics</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trainData</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">item</span><span class="p">[</span><span class="n">attributeIndex</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">statusStatistics</span><span class="p">:</span>
            <span class="n">statusStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">attributeIndex</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="k">if</span> <span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">statusStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">attributeIndex</span><span class="p">]]:</span>
            <span class="n">statusStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">attributeIndex</span><span class="p">]][</span><span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]]</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">statusStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">attributeIndex</span><span class="p">]][</span><span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]]</span> <span class="o">=</span> <span class="n">statusStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">attributeIndex</span><span class="p">]][</span><span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]]</span> <span class="o">+</span> <span class="mi">1</span>

    <span class="n">infoEntropyMap</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">status</span> <span class="ow">in</span> <span class="n">statusStatistics</span><span class="p">:</span>
        <span class="n">amount</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">statusStatistics</span><span class="p">[</span><span class="n">status</span><span class="p">]:</span>
            <span class="n">amount</span> <span class="o">=</span> <span class="n">amount</span> <span class="o">+</span> <span class="n">statusStatistics</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="n">result</span><span class="p">]</span>

        <span class="n">infoEntropy</span> <span class="o">=</span> <span class="mf">0.0</span>
        <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">statusStatistics</span><span class="p">[</span><span class="n">status</span><span class="p">]:</span>
            <span class="n">probability</span> <span class="o">=</span> <span class="n">statusStatistics</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="n">result</span><span class="p">]</span> <span class="o">/</span> <span class="n">amount</span>
            <span class="n">infoEntropy</span> <span class="o">=</span> <span class="n">infoEntropy</span> <span class="o">+</span> <span class="p">(</span><span class="n">probability</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">probability</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>

        <span class="n">infoEntropyMap</span><span class="p">[</span><span class="n">status</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="n">infoEntropyMap</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"count"</span><span class="p">]</span> <span class="o">=</span> <span class="n">amount</span>
        <span class="n">infoEntropyMap</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">infoEntropy</span> <span class="o">*</span> <span class="o">-</span><span class="mi">1</span>
    <span class="k">return</span> <span class="n">infoEntropyMap</span>


<span class="k">def</span> <span class="nf">getInfoEntropy</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">):</span>
    <span class="sd">"""</span>

<span class="sd">    :param trainData:</span>
<span class="sd">    :param properties:</span>
<span class="sd">    :param propertiesIndex:</span>
<span class="sd">    :return:</span>
<span class="sd">    {</span>
<span class="sd">        attribute: {</span>
<span class="sd">            status: {</span>
<span class="sd">                count: value,</span>
<span class="sd">                value: value,</span>
<span class="sd">            },</span>
<span class="sd">            ......</span>
<span class="sd">        },</span>
<span class="sd">        ......</span>
<span class="sd">    }</span>
<span class="sd">    """</span>
    <span class="n">infoEntropy</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">properties</span><span class="p">:</span>
        <span class="n">infoEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="n">calculateInfoEntropy</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">[</span><span class="n">attribute</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">infoEntropy</span>


<span class="k">def</span> <span class="nf">calculateNormalizedInfoEntropy</span><span class="p">(</span><span class="n">attributeInfoEntropy</span><span class="p">):</span>
    <span class="n">amount</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">status</span> <span class="ow">in</span> <span class="n">attributeInfoEntropy</span><span class="p">:</span>
        <span class="n">amount</span> <span class="o">=</span> <span class="n">amount</span> <span class="o">+</span> <span class="n">attributeInfoEntropy</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"count"</span><span class="p">]</span>

    <span class="n">normalizedInfoEntropy</span> <span class="o">=</span> <span class="mf">0.0</span>
    <span class="k">for</span> <span class="n">status</span> <span class="ow">in</span> <span class="n">attributeInfoEntropy</span><span class="p">:</span>
        <span class="n">probability</span> <span class="o">=</span> <span class="n">attributeInfoEntropy</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"count"</span><span class="p">]</span> <span class="o">/</span> <span class="n">amount</span>
        <span class="n">normalizedInfoEntropy</span> <span class="o">=</span> <span class="n">normalizedInfoEntropy</span> <span class="o">+</span> <span class="n">probability</span> <span class="o">*</span> <span class="n">attributeInfoEntropy</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"value"</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">normalizedInfoEntropy</span>


<span class="k">def</span> <span class="nf">getNormalizedInfoEntropy</span><span class="p">(</span><span class="n">infoEntropy</span><span class="p">):</span>
    <span class="n">normalizedInfoEntropy</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">infoEntropy</span><span class="p">:</span>
        <span class="n">normalizedInfoEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="n">calculateNormalizedInfoEntropy</span><span class="p">(</span><span class="n">infoEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">normalizedInfoEntropy</span>


<span class="k">def</span> <span class="nf">getAttributeEntropy</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">):</span>
    <span class="n">attributeEntropy</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">properties</span><span class="p">:</span>
        <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.0</span>
        <span class="n">index</span> <span class="o">=</span> <span class="n">propertiesIndex</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span>
        <span class="n">attributeStatistics</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trainData</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">item</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">attributeStatistics</span><span class="p">:</span>
                <span class="n">attributeStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">index</span><span class="p">]]</span> <span class="o">=</span> <span class="mi">1</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">attributeStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">index</span><span class="p">]]</span> <span class="o">=</span> <span class="n">attributeStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="n">index</span><span class="p">]]</span> <span class="o">+</span> <span class="mi">1</span>

        <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">attributeStatistics</span><span class="p">:</span>
            <span class="n">probability</span> <span class="o">=</span> <span class="n">attributeStatistics</span><span class="p">[</span><span class="n">item</span><span class="p">]</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">trainData</span><span class="p">)</span>
            <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">+</span> <span class="n">probability</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">probability</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
        <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">*</span> <span class="o">-</span><span class="mi">1</span>
    <span class="k">return</span> <span class="n">attributeEntropy</span>


<span class="k">def</span> <span class="nf">getInfoGainRate</span><span class="p">(</span><span class="n">infoGain</span><span class="p">,</span> <span class="n">attributeEntropy</span><span class="p">):</span>
    <span class="n">infoGainRate</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">attributeEntropy</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">==</span> <span class="mf">0.0</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">infoGainRate</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="n">infoGain</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">/</span> <span class="n">attributeEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">infoGainRate</span>


<span class="k">def</span> <span class="nf">selectNode</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
    <span class="sd">"""</span>

<span class="sd">    :param trainData:</span>
<span class="sd">    :param properties:</span>
<span class="sd">    :param propertiesIndex:</span>
<span class="sd">    :return:</span>
<span class="sd">    node (info about the attribute selected for this node): {</span>
<span class="sd">        name: name of the selected attribute</span>
<span class="sd">        status: { entropy of each attribute value</span>
<span class="sd">            key: {</span>
<span class="sd">                keys: []</span>
<span class="sd">                value: value</span>
<span class="sd">            }</span>
<span class="sd">        }</span>
<span class="sd">    }</span>
<span class="sd">    """</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">Current training set:"</span><span class="p">,</span> <span class="n">trainData</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Current attribute list:"</span><span class="p">,</span> <span class="n">properties</span><span class="p">)</span>
    <span class="n">resultStatistics</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trainData</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">resultStatistics</span><span class="p">:</span>
            <span class="n">resultStatistics</span><span class="p">[</span><span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]]</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">resultStatistics</span><span class="p">[</span><span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]]</span> <span class="o">=</span> <span class="n">resultStatistics</span><span class="p">[</span><span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]]</span> <span class="o">+</span> <span class="mi">1</span>

    <span class="n">rootInfoEntropy</span> <span class="o">=</span> <span class="mf">0.0</span>
    <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">resultStatistics</span><span class="p">:</span>
        <span class="n">probability</span> <span class="o">=</span> <span class="n">resultStatistics</span><span class="p">[</span><span class="n">result</span><span class="p">]</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">trainData</span><span class="p">)</span>
        <span class="n">rootInfoEntropy</span> <span class="o">=</span> <span class="n">rootInfoEntropy</span> <span class="o">+</span> <span class="p">(</span><span class="n">probability</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">probability</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
    <span class="n">rootInfoEntropy</span> <span class="o">=</span> <span class="n">rootInfoEntropy</span> <span class="o">*</span> <span class="o">-</span><span class="mi">1</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Root-node entropy of the current training set:"</span><span class="p">,</span> <span class="n">rootInfoEntropy</span><span class="p">)</span>

    <span class="n">infoEntropy</span> <span class="o">=</span> <span class="n">getInfoEntropy</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Per-attribute entropies of the current training set:"</span><span class="p">,</span> <span class="n">infoEntropy</span><span class="p">)</span>

    <span class="n">normalizedInfoEntropy</span> <span class="o">=</span> <span class="n">getNormalizedInfoEntropy</span><span class="p">(</span><span class="n">infoEntropy</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Normalized entropies of the current training set:"</span><span class="p">,</span> <span class="n">normalizedInfoEntropy</span><span class="p">)</span>

    <span class="n">infoGain</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">properties</span><span class="p">:</span>
        <span class="n">infoGain</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">=</span> <span class="n">rootInfoEntropy</span> <span class="o">-</span> <span class="n">normalizedInfoEntropy</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Information gains of the current training set:"</span><span class="p">,</span> <span class="n">infoGain</span><span class="p">)</span>

    <span class="n">name</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">infoGain</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">name</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">name</span> <span class="o">=</span> <span class="n">attribute</span>
        <span class="k">elif</span> <span class="n">infoGain</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">&gt;</span> <span class="n">infoGain</span><span class="p">[</span><span class="n">name</span><span class="p">]:</span>
            <span class="n">name</span> <span class="o">=</span> <span class="n">attribute</span>

    <span class="c1"># C4.5 additions start: when model == 1, select by information gain ratio instead</span>
    <span class="n">attributeEntropy</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">infoGainRate</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">if</span> <span class="n">model</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">attributeEntropy</span> <span class="o">=</span> <span class="n">getAttributeEntropy</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Current attribute entropies:"</span><span class="p">,</span> <span class="n">attributeEntropy</span><span class="p">)</span>

        <span class="n">infoGainRate</span> <span class="o">=</span> <span class="n">getInfoGainRate</span><span class="p">(</span><span class="n">infoGain</span><span class="p">,</span> <span class="n">attributeEntropy</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Current information gain ratios:"</span><span class="p">,</span> <span class="n">infoGainRate</span><span class="p">)</span>

        <span class="n">name</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="k">for</span> <span class="n">attribute</span> <span class="ow">in</span> <span class="n">infoGainRate</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">name</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
                <span class="n">name</span> <span class="o">=</span> <span class="n">attribute</span>
            <span class="k">elif</span> <span class="n">infoGainRate</span><span class="p">[</span><span class="n">attribute</span><span class="p">]</span> <span class="o">&gt;</span> <span class="n">infoGainRate</span><span class="p">[</span><span class="n">name</span><span class="p">]:</span>
                <span class="n">name</span> <span class="o">=</span> <span class="n">attribute</span>

    <span class="n">statusMap</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">status</span> <span class="ow">in</span> <span class="n">infoEntropy</span><span class="p">[</span><span class="n">name</span><span class="p">]:</span>
        <span class="n">statusMap</span><span class="p">[</span><span class="n">status</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="n">statusMap</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">infoEntropy</span><span class="p">[</span><span class="n">name</span><span class="p">][</span><span class="n">status</span><span class="p">][</span><span class="s2">"value"</span><span class="p">]</span>
        <span class="n">statusMap</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"keys"</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trainData</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">item</span><span class="p">[</span><span class="n">propertiesIndex</span><span class="p">[</span><span class="n">name</span><span class="p">]]</span> <span class="o">==</span> <span class="n">status</span><span class="p">:</span>
                <span class="n">statusMap</span><span class="p">[</span><span class="n">status</span><span class="p">][</span><span class="s2">"keys"</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>

    <span class="n">node</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s2">"name"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span>
        <span class="s2">"status"</span><span class="p">:</span> <span class="n">statusMap</span><span class="p">,</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">node</span>


<span class="k">def</span> <span class="nf">getID3Tree</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
    <span class="sd">"""</span>
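<span class="sd">    Recursively build the decision tree (ID3 or C4.5, depending on model).</span>
<span class="sd">    :param trainData: dict mapping each sample tuple to its class label</span>
<span class="sd">    :param properties: list of attribute names still available for splitting</span>
<span class="sd">    :param propertiesIndex: dict mapping each attribute name to its position in a sample tuple</span>
<span class="sd">    :param model: 0 for ID3 (information gain), 1 for C4.5 (information gain ratio)</span>
<span class="sd">    :return: (tree, name of the attribute chosen for this node)</span>

<span class="sd">    node (the attribute selected for the current node): {</span>
<span class="sd">        name: attribute name</span>
<span class="sd">        status: { information entropy of each attribute value</span>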
<span class="sd">            key: {</span>
<span class="sd">                keys: []</span>
<span class="sd">                value: value</span>
<span class="sd">            }</span>
<span class="sd">        }</span>
<span class="sd">    }</span>
<span class="sd">    """</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">selectNode</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">,</span> <span class="n">model</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"当前选择的最优节点为:"</span><span class="p">,</span> <span class="n">node</span><span class="p">)</span>
    <span class="n">tree</span> <span class="o">=</span> <span class="p">{</span>
        <span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">]:</span> <span class="p">{},</span>
    <span class="p">}</span>

    <span class="n">tempProperties</span> <span class="o">=</span> <span class="n">copy</span><span class="p">(</span><span class="n">properties</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"进行属性列表裁剪:"</span><span class="p">,</span> <span class="n">tempProperties</span><span class="p">,</span> <span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">])</span>
    <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">]</span> <span class="ow">in</span> <span class="n">tempProperties</span><span class="p">:</span>
        <span class="n">tempProperties</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">])</span>

    <span class="k">for</span> <span class="n">status</span> <span class="ow">in</span> <span class="n">node</span><span class="p">[</span><span class="s2">"status"</span><span class="p">]:</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s2">"status"</span><span class="p">][</span><span class="n">status</span><span class="p">][</span><span class="s2">"value"</span><span class="p">]</span> <span class="o">==</span> <span class="mf">0.0</span><span class="p">:</span>
            <span class="n">tree</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">]][</span><span class="n">status</span><span class="p">]</span> <span class="o">=</span> <span class="n">trainData</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s2">"status"</span><span class="p">][</span><span class="n">status</span><span class="p">][</span><span class="s2">"keys"</span><span class="p">][</span><span class="mi">0</span><span class="p">]]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">tempTrainData</span> <span class="o">=</span> <span class="p">{}</span>
            <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">trainData</span><span class="p">:</span>
                <span class="k">if</span> <span class="n">item</span><span class="p">[</span><span class="n">propertiesIndex</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">]]]</span> <span class="o">==</span> <span class="n">status</span><span class="p">:</span>
                    <span class="n">tempTrainData</span><span class="p">[</span><span class="n">item</span><span class="p">]</span> <span class="o">=</span> <span class="n">trainData</span><span class="p">[</span><span class="n">item</span><span class="p">]</span>

            <span class="n">tree</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">]][</span><span class="n">status</span><span class="p">],</span> <span class="n">nodeName</span> <span class="o">=</span> <span class="n">getID3Tree</span><span class="p">(</span><span class="n">tempTrainData</span><span class="p">,</span> <span class="n">tempProperties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">,</span> <span class="n">model</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">"进行属性列表裁剪:"</span><span class="p">,</span> <span class="n">tempProperties</span><span class="p">,</span> <span class="n">nodeName</span><span class="p">)</span>
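            <span class="c1"># Note: tempProperties is shared by the remaining sibling branches,</span>
            <span class="c1"># so an attribute chosen in this subtree is no longer offered to them.</span>
            <span class="k">if</span> <span class="n">nodeName</span> <span class="ow">in</span> <span class="n">tempProperties</span><span class="p">:</span>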
                <span class="n">tempProperties</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">nodeName</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">tree</span><span class="p">,</span> <span class="n">node</span><span class="p">[</span><span class="s2">"name"</span><span class="p">]</span>


<span class="k">def</span> <span class="nf">printTree</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="n">interval</span><span class="p">):</span>
    <span class="n">newInterval</span> <span class="o">=</span> <span class="n">interval</span> <span class="o">+</span> <span class="s2">"</span><span class="se">\t</span><span class="s2">"</span>
    <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">tree</span><span class="p">:</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tree</span><span class="p">[</span><span class="n">item</span><span class="p">],</span> <span class="nb">dict</span><span class="p">):</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">interval</span> <span class="o">+</span> <span class="n">item</span><span class="p">,</span> <span class="n">tree</span><span class="p">[</span><span class="n">item</span><span class="p">])</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">interval</span> <span class="o">+</span> <span class="n">item</span><span class="p">)</span>
            <span class="n">printTree</span><span class="p">(</span><span class="n">tree</span><span class="p">[</span><span class="n">item</span><span class="p">],</span> <span class="n">newInterval</span><span class="p">)</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="c1"># Algorithm mode: 0 -&gt; ID3, 1 -&gt; C4.5</span>
    <span class="n">model</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="n">properties</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"weather"</span><span class="p">,</span> <span class="s2">"temperature"</span><span class="p">,</span> <span class="s2">"humidity"</span><span class="p">,</span> <span class="s2">"windy"</span><span class="p">]</span>
    <span class="n">propertiesIndex</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"weather"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">"temperature"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">"humidity"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">"windy"</span><span class="p">:</span> <span class="mi">3</span><span class="p">}</span>
    <span class="c1"># Labels: 0 = do not play basketball, 1 = play basketball</span>
    <span class="n">trainData</span> <span class="o">=</span> <span class="p">{</span>
        <span class="p">(</span><span class="s2">"sun"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"middle"</span><span class="p">,</span> <span class="s2">"no"</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">(</span><span class="s2">"sun"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"middle"</span><span class="p">,</span> <span class="s2">"yes"</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">(</span><span class="s2">"cloud"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"no"</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span>
        <span class="p">(</span><span class="s2">"rain"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"no"</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span>
        <span class="p">(</span><span class="s2">"rain"</span><span class="p">,</span> <span class="s2">"low"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"no"</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">(</span><span class="s2">"sun"</span><span class="p">,</span> <span class="s2">"middle"</span><span class="p">,</span> <span class="s2">"middle"</span><span class="p">,</span> <span class="s2">"yes"</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span>
        <span class="p">(</span><span class="s2">"cloud"</span><span class="p">,</span> <span class="s2">"middle"</span><span class="p">,</span> <span class="s2">"high"</span><span class="p">,</span> <span class="s2">"yes"</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span>
    <span class="p">}</span>
    <span class="c1"># for key in trainData:</span>
    <span class="c1">#     print(key, trainData[key])</span>

    <span class="n">tree</span><span class="p">,</span> <span class="n">name</span> <span class="o">=</span> <span class="n">getID3Tree</span><span class="p">(</span><span class="n">trainData</span><span class="p">,</span> <span class="n">properties</span><span class="p">,</span> <span class="n">propertiesIndex</span><span class="p">,</span> <span class="n">model</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">tree</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">"'"</span><span class="p">,</span> <span class="s1">'"'</span><span class="p">))</span>
    <span class="n">printTree</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span>
</code></pre>
     </div>
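     <p>
      The nested dict returned by <code>getID3Tree</code> can also be used for prediction. The following is a minimal sketch, not part of the original script: <code>classify</code> is a hypothetical helper, and the <code>tree</code> literal is written by hand in the shape <code>{attribute: {value: subtree_or_label}}</code> that the script appears to produce for the sample data, so treat it as an assumption rather than captured output.
     </p>

```python
def classify(tree, sample, properties_index):
    """Walk a nested-dict decision tree and return the predicted label.

    Returns None when the sample carries an attribute value that has
    no branch in the tree (i.e. it was never seen during training).
    """
    node = tree
    while isinstance(node, dict):
        # An internal node is a single-key dict: {attribute: {value: subtree_or_label}}
        attribute = next(iter(node))
        branches = node[attribute]
        value = sample[properties_index[attribute]]
        if value not in branches:
            return None
        node = branches[value]
    # Anything that is not a dict is a leaf, i.e. a class label
    return node


properties_index = {"weather": 0, "temperature": 1, "humidity": 2, "windy": 3}

# Hand-written tree (assumed shape, same layout as the first value returned by getID3Tree)
tree = {
    "temperature": {
        "high": {"weather": {"sun": 0, "cloud": 1, "rain": 1}},
        "low": 0,
        "middle": {"humidity": {"middle": 1, "high": 0}},
    }
}

print(classify(tree, ("sun", "high", "middle", "no"), properties_index))     # 0
print(classify(tree, ("cloud", "middle", "high", "yes"), properties_index))  # 0
print(classify(tree, ("snow", "high", "high", "no"), properties_index))      # None
```

     <p>
      Returning <code>None</code> for an unseen attribute value is only one possible design choice; falling back to the majority class of the parent node would be another.
     </p>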
     <p>
      Running the full decision-tree script with <code>model = 0</code> (ID3) prints the following:
     </p>
     <div class="codehilite">
      <pre><span></span><code><span class="err">当前训练集为</span><span class="p">:</span> <span class="err">{</span><span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="s1">'rain'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="s1">'rain'</span><span class="p">,</span> <span class="s1">'low'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'middle'</span><span 
class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">):</span> <span class="mi">0</span><span class="err">}</span>
<span class="err">当前的属性列表为</span><span class="p">:</span> <span class="p">[</span><span class="s1">'weather'</span><span class="p">,</span> <span class="s1">'temperature'</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span>
<span class="err">当前训练集的根节点的信息熵</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9852281360342516</span>
<span class="err">当前训练集的信息熵</span><span class="p">:</span> <span class="err">{</span><span class="s1">'weather'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'sun'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9182958340544896</span><span class="err">}</span><span class="p">,</span> <span class="s1">'cloud'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'rain'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}}</span><span class="p">,</span> <span class="s1">'temperature'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'high'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'low'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span 
class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}}</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'middle'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9182958340544896</span><span class="err">}</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}}</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'no'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span 
class="s1">'value'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9182958340544896</span><span class="err">}}}</span>
<span class="err">当前训练集的归一化信息熵</span><span class="p">:</span> <span class="err">{</span><span class="s1">'weather'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9649839288804954</span><span class="p">,</span> <span class="s1">'temperature'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">8571428571428571</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9649839288804954</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9649839288804954</span><span class="err">}</span>
<span class="err">当前训练集的信息增益</span><span class="p">:</span> <span class="err">{</span><span class="s1">'weather'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">02024420715375619</span><span class="p">,</span> <span class="s1">'temperature'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">12808527889139454</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">02024420715375619</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">02024420715375619</span><span class="err">}</span>
<span class="err">当前选择的最优节点为</span><span class="p">:</span> <span class="err">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'temperature'</span><span class="p">,</span> <span class="s1">'status'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'high'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">),</span> <span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">),</span> <span class="p">(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">),</span> <span class="p">(</span><span class="s1">'rain'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">)]</span><span class="err">}</span><span class="p">,</span> <span class="s1">'low'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'rain'</span><span class="p">,</span> 
<span class="s1">'low'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">)]</span><span class="err">}</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">),</span> <span class="p">(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">)]</span><span class="err">}}}</span>
<span class="err">进行属性列表裁剪</span><span class="p">:</span> <span class="p">[</span><span class="s1">'weather'</span><span class="p">,</span> <span class="s1">'temperature'</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span> <span class="n">temperature</span>

<span class="err">当前训练集为</span><span class="p">:</span> <span class="err">{</span><span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">):</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="s1">'rain'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">):</span> <span class="mi">1</span><span class="err">}</span>
<span class="err">当前的属性列表为</span><span class="p">:</span> <span class="p">[</span><span class="s1">'weather'</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span>
<span class="err">当前训练集的根节点的信息熵</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span>
<span class="err">当前训练集的信息熵</span><span class="p">:</span> <span class="err">{</span><span class="s1">'weather'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'sun'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'cloud'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'rain'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}}</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'middle'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span 
class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}}</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'no'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">9182958340544896</span><span class="err">}</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}}}</span>
<span class="err">当前训练集的归一化信息熵</span><span class="p">:</span> <span class="err">{</span><span class="s1">'weather'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">6887218755408672</span><span class="err">}</span>
<span class="err">Information gain on the current training set</span><span class="p">:</span> <span class="err">{</span><span class="s1">'weather'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">31127812445913283</span><span class="err">}</span>
<span class="err">Best node selected</span><span class="p">:</span> <span class="err">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'weather'</span><span class="p">,</span> <span class="s1">'status'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'sun'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">),</span> <span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">)]</span><span class="err">}</span><span class="p">,</span> <span class="s1">'cloud'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">)]</span><span class="err">}</span><span class="p">,</span> <span class="s1">'rain'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'rain'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">)]</span><span class="err">}}}</span>
<span class="err">Pruning the attribute list</span><span class="p">:</span> <span class="p">[</span><span class="s1">'weather'</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span> <span class="n">weather</span>
<span class="err">Pruning the attribute list</span><span class="p">:</span> <span class="p">[</span><span class="s1">'weather'</span><span class="p">,</span> <span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span> <span class="n">weather</span>

<span class="err">Current training set</span><span class="p">:</span> <span class="err">{</span><span class="p">(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">):</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">):</span> <span class="mi">0</span><span class="err">}</span>
<span class="err">Current attribute list</span><span class="p">:</span> <span class="p">[</span><span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span>
<span class="err">Root-node entropy of the current training set</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span>
<span class="err">Entropy of the current training set</span><span class="p">:</span> <span class="err">{</span><span class="s1">'humidity'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'middle'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}}</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'yes'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'count'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}}}</span>
<span class="err">Normalized entropy of the current training set</span><span class="p">:</span> <span class="err">{</span><span class="s1">'humidity'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span>
<span class="err">Information gain on the current training set</span><span class="p">:</span> <span class="err">{</span><span class="s1">'humidity'</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="err">}</span>
<span class="err">Best node selected</span><span class="p">:</span> <span class="err">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'status'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'middle'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'sun'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">)]</span><span class="err">}</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">:</span> <span class="err">{</span><span class="s1">'value'</span><span class="p">:</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'keys'</span><span class="p">:</span> <span class="p">[(</span><span class="s1">'cloud'</span><span class="p">,</span> <span class="s1">'middle'</span><span class="p">,</span> <span class="s1">'high'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">)]</span><span class="err">}}}</span>
<span class="err">Pruning the attribute list</span><span class="p">:</span> <span class="p">[</span><span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span> <span class="n">humidity</span>
<span class="err">Pruning the attribute list</span><span class="p">:</span> <span class="p">[</span><span class="s1">'humidity'</span><span class="p">,</span> <span class="s1">'windy'</span><span class="p">]</span> <span class="n">humidity</span>
<span class="err">{</span><span class="ss">"temperature"</span><span class="p">:</span> <span class="err">{</span><span class="ss">"high"</span><span class="p">:</span> <span class="err">{</span><span class="ss">"weather"</span><span class="p">:</span> <span class="err">{</span><span class="ss">"sun"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="ss">"cloud"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">"rain"</span><span class="p">:</span> <span class="mi">1</span><span class="err">}}</span><span class="p">,</span> <span class="ss">"low"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="ss">"middle"</span><span class="p">:</span> <span class="err">{</span><span class="ss">"humidity"</span><span class="p">:</span> <span class="err">{</span><span class="ss">"middle"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">"high"</span><span class="p">:</span> <span class="mi">0</span><span class="err">}}}}</span>
<span class="n">temperature</span>
    <span class="n">high</span>
        <span class="n">weather</span>
            <span class="n">sun</span> <span class="mi">0</span>
            <span class="n">cloud</span> <span class="mi">1</span>
            <span class="n">rain</span> <span class="mi">1</span>
    <span class="n">low</span> <span class="mi">0</span>
    <span class="n">middle</span>
        <span class="n">humidity</span>
            <span class="n">middle</span> <span class="mi">1</span>
            <span class="n">high</span> <span class="mi">0</span>
</code></pre>
     </div>
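     <p>
      The last step of the log above can be checked by hand with a short sketch. It is not the author's implementation: the helper names <code>entropy</code>, <code>information_gain</code> and <code>render_tree</code> are hypothetical, and the column order <code>(weather, temperature, humidity, windy)</code> is an assumption inferred from the tuples printed in the log. Under those assumptions it recomputes the information gain for the two-sample subset and re-renders the final nested-dict tree as indented text.
     </p>

```python
import math
from collections import Counter

# Assumed layout, inferred from the log: each sample is a tuple
# (weather, temperature, humidity, windy) mapped to a 0/1 class label.
COLUMNS = ["weather", "temperature", "humidity", "windy"]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def information_gain(samples, attribute):
    """ID3 gain: parent entropy minus the weighted entropy of each branch."""
    index = COLUMNS.index(attribute)
    branches = {}
    for features, label in samples.items():
        # Group the labels by the value this sample takes for the attribute.
        branches.setdefault(features[index], []).append(label)
    total = len(samples)
    weighted = sum(len(b) / total * entropy(b) for b in branches.values())
    return entropy(list(samples.values())) - weighted


def render_tree(tree, depth=0):
    """Return indented lines for a nested-dict tree whose leaves are labels."""
    lines = []
    for key, child in tree.items():
        if isinstance(child, dict):
            lines.append("    " * depth + str(key))
            lines.extend(render_tree(child, depth + 1))
        else:
            lines.append("    " * depth + f"{key} {child}")
    return lines


# The two-sample subset from the last step of the log.
subset = {
    ("sun", "middle", "middle", "yes"): 1,
    ("cloud", "middle", "high", "yes"): 0,
}
gains = {name: information_gain(subset, name) for name in ["humidity", "windy"]}
print(gains)

# The final tree, copied from the log output.
tree = {
    "temperature": {
        "high": {"weather": {"sun": 0, "cloud": 1, "rain": 1}},
        "low": 0,
        "middle": {"humidity": {"middle": 1, "high": 0}},
    }
}
print("\n".join(render_tree(tree)))
```

     <p>
      In this subset <code>humidity</code> separates the two samples completely while <code>windy</code> takes a single value, so the sketch agrees with the log in choosing <code>humidity</code> as the split.
     </p>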
    </article>
   </div>
   <!-- Footer -->
   <footer id="footer">
    <p class="copyright">
     © Untitled. Design:
     <a href="https://html5up.net">
      HTML5 UP
     </a>
     .
    </p>
   </footer>
  </div>
  <!-- BG -->
  <div id="bg">
  </div>
  <!-- Scripts -->
  <script src="../assets/js/jquery.min.js">
  </script>
  <script src="../assets/js/browser.min.js">
  </script>
  <script src="../assets/js/breakpoints.min.js">
  </script>
  <script src="../assets/js/util.js">
  </script>
  <script src="../assets/js/main.js">
  </script>
 </body>
</html>
